Parallel Scheduling Algorithms*
Eliezer Dekel and Sartaj Sahni
University of Minnesota


Abstract:

We obtain fast parallel algorithms for several scheduling
problems. Some of the problems considered are: scheduling
to minimize the number of tardy jobs, job sequencing with
deadlines, scheduling to minimize earliness and tardiness
penalties, channel assignment, and minimizing the mean finish
time. The shared memory model of parallel computers is used.

1. Introduction


With the continuing dramatic decline in the cost of
hardware, it is becoming feasible to economically build
computers with thousands of processors. In fact, Batcher [3]
describes a computer (MPP) with 16,384 processors that is
currently being built for NASA. In coming years, one can
expect to see computers with a hundred thousand or even a
million processing elements. This expectation has motivated
the study of parallel algorithms.

Since the complexity of a parallel algorithm depends
very much on the architecture of the parallel computer on
which it is run, it is necessary to keep the architecture in
mind when designing the algorithm. Several parallel
architectures have been proposed and studied. In this paper, we
shall deal directly only with the single instruction stream,
multiple data stream (SIMD) model. Our techniques and
algorithms readily adapt to the other models (e.g., multiple
instruction stream multiple data stream (MIMD) and data flow
models). SIMD computers have the following characteristics:

(1) They consist of p processing elements (PEs). The PEs
are indexed 0,1,...,p-1 and an individual PE may be
referenced as in PE(i). Each PE is capable of performing
the standard arithmetic and logical operations. In
addition, each PE knows its index.

(2) Each PE has some local memory.

(3) The PEs are synchronized and operate under the control
of a single instruction stream.

(4) An enable/disable mask can be used to select a subset
of the PEs that are to perform an instruction. Only the
enabled PEs will perform the instruction. The remaining
PEs will be idle. All enabled PEs execute the same
instruction. The set of enabled PEs can change from
instruction to instruction.


While several SIMD models have been proposed, in this
paper we shall deal explicitly with only the shared memory
model (SMM). In this model, there is a large common memory
that is shared by all the PEs. It is assumed that any PE
can access any word of this common memory in O(1) time.
When two or more PEs access the same word simultaneously, we
shall say that a conflict has occurred. If all the PEs (at
least two) that simultaneously access the same word wish to
write in it, it is called a write conflict. If all wish to
read, then it is a read conflict. Write conflicts may be
permitted so long as all the PEs wish to write the same
piece of information. As far as our discussion here is
concerned, no read or write conflicts are allowed. A
description of some of the other SIMD models can be found in
[5].


Most algorithmic studies of parallel computation have
been based on the SMM [1,2,4,7,10,11,19,21,22]. Parallel
matrix and graph algorithms for the SMM have been developed
by Agerwala and Lint [1], Arjomandi [2], Csanky [4],
Eckstein [7], Hirschberg, Chandra, and Sarwate [10], and Savage
[22]. Hirschberg [11], Muller and Preparata [19], and
Preparata [21] have considered the sorting problem for SMMs.
The results of Muller and Preparata [19] and Preparata [21]
will be made use of in this paper. In these two papers, it
is shown that n numbers can be sorted in O(log n) time if n^2
PEs are available, and in O(log^2 n) time when n PEs are
available.

Dekel and Sahni [6] develop a design technique for
parallel algorithms that is based on binary computation
trees. This design technique is illustrated using several
examples from scheduling theory. Some of the scheduling
problems considered by them are:

P1: Schedule many machines to minimize maximum lateness
    when all jobs have a processing time p_i = 1.

P2: Schedule one machine to minimize maximum lateness.
    Preemptions are permitted.

P3: Schedule one machine to minimize the number of tardy
    jobs.

P4: The job sequencing with deadlines problem.


The complexity of their parallel algorithms for all the
above problems is O(log^2 n).

When measuring the effectiveness of a parallel algorithm,
one needs to consider both its complexity and
its cost in terms of the number of PEs used. The effectiveness
of processor utilization (EPU) is defined with respect
to a parallel algorithm and the fastest known sequential
(i.e., single processor) algorithm for the same problem.
Let P be a problem and A a parallel algorithm for P. We
define:

$$ \mathrm{EPU}(P, A) = \frac{\text{complexity of the fastest sequential algorithm for } P}{(\text{number of PEs used by } A) \times (\text{complexity of } A)} $$

The algorithm of [6] for problem P1 above uses n/2 PEs
and has a complexity of O(log^2 n). The fastest sequential
algorithm known for this problem is due to Horn [4] and runs
in O(n log n) time. So, the EPU of the parallel algorithm of
[6] for P1 is O(n log n/(n log^2 n)) = O(1/log n).
The best EPU one can hope for is O(1). Few parallel
algorithms achieve this EPU. Dekel and Sahni [6] present
some algorithms that do. One that we shall need here is for
the partial sums problem. We are given n numbers
a_1, a_2, ..., a_n and are required to compute

$$ A_j = \bigoplus_{i=1}^{j} a_i, \quad 1 \le j \le n, $$

where ⊕ is any associative operator (e.g., max, min, +, *).
Their algorithm runs in O(log n) time and uses n/log n PEs.
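
The idea behind an n/log n PE bound for partial sums can be sketched sequentially as follows. This is an illustration only and not the algorithm of [6] itself: the function name `partial_sums` and the three-phase block layout are assumptions made here, and each step marked "parallel" stands for work that separate PEs would perform simultaneously.

```python
import math
import operator
from itertools import accumulate

def partial_sums(a, op=operator.add):
    """Return [a1, a1 op a2, ..., a1 op ... op an] using a blocked scan.

    The input is cut into blocks of about log2(n) elements, one block per
    simulated PE, so roughly n/log n PEs would be involved.
    """
    n = len(a)
    if n == 0:
        return []
    size = max(1, math.ceil(math.log2(n)))
    blocks = [a[i:i + size] for i in range(0, n, size)]

    # Phase 1 (parallel over blocks): each PE scans its own block.
    local = [list(accumulate(b, op)) for b in blocks]

    # Phase 2: scan the block totals. Done serially here; on the SMM this
    # would be a binary tree computation over about n/log n values.
    totals = list(accumulate((blk[-1] for blk in local), op))

    # Phase 3 (parallel over blocks): fold the total of all preceding
    # blocks into every entry of the block.
    out = list(local[0])
    for b in range(1, len(local)):
        out.extend(op(totals[b - 1], x) for x in local[b])
    return out
```

Each phase touches every block for O(log n) steps, which is where the O(log n) time with n/log n PEs comes from in this sketch. Any associative two-argument function can be passed as `op`, for example the built-in `max`.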

In this paper, we consider several scheduling problems.
Fast parallel algorithms are obtained for each. In each
case, the complexity analysis is carried out on the assumption
that as many PEs as needed are available. This is in
conformance with the assumption made in almost all the
research work done on parallel computing. This assumption
is of course unrealistic. A parallel algorithm will eventually
be run on a machine with a finite number (say k) of
PEs. It should be easy to see that all our algorithms are
easily adapted to the case of k PEs. If our algorithm has
complexity O(g(n)) using f(n) PEs, then with k PEs, k <
f(n), its complexity is O(g(n)f(n)/k). We shall continue
with tradition, and explicitly analyse our algorithms only
for the case when as many PEs as needed are available.

In Sections 2 and 3, we consider two relatively simple
examples. The first of these is to minimize the finish time
when m identical machines are available. The second example
is to minimize the mean finish time when m uniform machines
are available. In Sections 4, 5, 6, and 7, we respectively
consider the following problems:

(i) minimize the number of tardy jobs when p_i = 1,
1 ≤ i ≤ n and 1 machine is available.

(ii) job sequencing with deadlines.

(iii) schedule one machine to minimize the maximum
earliness and tardiness penalties.

and

(iv) channel assignment.

2. Minimum Finish Time


When preemptions are permitted, a minimum finish time
schedule for m machines is efficiently obtained using
McNaughton's rule [17]. Let p_1, p_2, ..., p_n be the processing
times of the n jobs. The finish time, f, of an optimal
preemptive schedule is given by:

$$ f = \max\Big\{ \max_{1 \le i \le n} \{p_i\},\ \frac{1}{m} \sum_{i=1}^{n} p_i \Big\} $$

Using f, the optimal schedule may be constructed in
O(n) time [17]. Job 1 is scheduled on machine 1 from 0 to
p_1 and job 2 from p_1 to min{p_1+p_2, f}. If p_1+p_2 > f, then
the remainder of job 2 is done on machine 2 starting at time
0. If p_1+p_2 ≤ f, then job 3 is scheduled on machine 1 from
p_1+p_2 to min{p_1+p_2+p_3, f}; etc.


Using the parallel algorithms of [6], max{p_i} and the sum of
all the p_i may be computed in O(log n) time with n/log n PEs.
To obtain the actual schedule, we also need

$$ A_i = \sum_{j=1}^{i} p_j, \quad 1 \le i \le n. $$

As mentioned in Section 1, all the A_i's can be computed in O(log n)
time using n/log n PEs ([6]). Let A_0 = 0. Each job i can now
determine its own processing assignment by using the following
rule:


x ← ⌈A_{i-1}/f⌉ * f - A_{i-1}
case
  :x = 0:   schedule job i on machine ⌈A_i/f⌉ from 0 to p_i
  :x ≥ p_i: schedule job i on machine ⌈A_i/f⌉ from f-x to f-x+p_i
  :else:    schedule job i on machine ⌈A_{i-1}/f⌉ from f-x to f
            and on machine ⌈A_i/f⌉ from 0 to p_i-x
end case


One may verify that x gives the amount of processing
time left on the machine ⌈A_{i-1}/f⌉ after job i-1 is
finished on that machine.


Example 2.1: Suppose we have 14 jobs with processing times
as given in Figure 2.1. Let m=5. f = max{7, 50/5} = 10.
Figure 2.1 gives the A_i and x values for each job. The
schedule obtained is given in Figure 2.2. []


Job   1   2   3   4   5   6   7   8   9  10  11  12  13  14
p_i   5   3   1   2   7   4   1   5   2   4   6   1   6   3
A_i   5   8   9  11  18  22  23  28  30  34  40  41  47  50
x     0   5   2   1   9   2   8   7   2   0   6   0   9   3

Figure 2.1


If we have n PEs, all the machine assignments can be
computed in O(1) time. However, using only n/log n PEs,
these assignments may be obtained in O(log n) time (i.e.,
each PE computes at most ⌈log n⌉ assignments). So, the
overall scheduling algorithm has a complexity of O(log n) and
uses n/log n PEs. So, its EPU is O(n/((n/log n) * log n)) = O(1).
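
The per-job rule of this section can be tried out with the following sketch. It is a sequential Python illustration in which every loop iteration is independent of the others, as it would be for one PE per job. The function name `mcnaughton` and the output format are assumptions made for the example. In the wrap-around branch the first x units of job i are placed on machine ⌈A_{i-1}/f⌉ from f-x to f and the rest on the next machine from 0, which is how McNaughton's rule fills a machine up to f. Exact rational arithmetic is used because f need not be an integer.

```python
from fractions import Fraction
from itertools import accumulate
from math import ceil

def mcnaughton(p, m):
    """Preemptive schedule of jobs p on m machines; returns (f, pieces).

    pieces[i] is a list of (machine, start, end) triples for job i+1,
    with machines numbered from 1.
    """
    f = max(Fraction(max(p)), Fraction(sum(p), m))
    A = [0] + list(accumulate(p))        # A[0] = 0, A[i] = p_1 + ... + p_i
    pieces = []
    for i in range(1, len(p) + 1):       # iterations are independent
        prev = ceil(A[i - 1] / f)        # machine on which job i-1 ends
        cur = ceil(A[i] / f)             # machine on which job i ends
        x = prev * f - A[i - 1]          # time left on machine prev
        if x == 0:
            pieces.append([(cur, 0, p[i - 1])])
        elif x >= p[i - 1]:
            pieces.append([(cur, f - x, f - x + p[i - 1])])
        else:                            # job i wraps onto the next machine
            pieces.append([(prev, f - x, f), (cur, 0, p[i - 1] - x)])
    return f, pieces
```

For the data of Example 2.1, job 4 (x = 1, p_4 = 2) is the first job that is split: it runs on machine 1 from 9 to 10 and on machine 2 from 0 to 1. Because p_i ≤ f, the two pieces of a split job never overlap in time.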
3. Minimum Mean Finish Time

A non-preemptive schedule that minimizes the mean finish
time of n jobs on m identical machines is obtained by using
the SPT rule. By simply using a parallel sorting algorithm,
this schedule may be obtained in O(log n) time with n^2 PEs or
in O(log^2 n) time with n PEs.

Let us consider the case of m uniform parallel
machines. Associated with machine i is a speed s_i. It
takes machine i, p_j/s_i time units to complete the processing
of job j. Horowitz and Sahni [13] present an O(n log mn)
algorithm that constructs a minimum mean finish time
schedule for this case. Their algorithm is reproduced in
Figure 3.1. This algorithm assumes that the speeds and
processing times have been normalized and sorted such that
s_1 = 1 ≤ s_2 ≤ ... ≤ s_m and p_1 ≤ p_2 ≤ ... ≤ p_n.

By examining this algorithm, we see that another way to
obtain an optimal schedule is to sort the mn numbers i/s_j,
1 ≤ i ≤ n, 1 ≤ j ≤ m into nondecreasing order. Let the resulting
sequence be a_1, a_2, a_3, ..., a_mn. If a_i corresponds to
q/s_j, then job n+1-i is scheduled on machine j and there are
q-1 jobs following it on that machine.
This information may be obtained in O(log^2 mn) time
using a parallel sort and mn PEs or in O(log mn) time with
m^2 n^2 PEs. If we use the former sort algorithm, the EPU of our
parallel algorithm is O(n log mn/(mn * log^2 mn)) = O(1/(m log mn)).
If the latter sort algorithm is used, the EPU of our
scheduling algorithm becomes O(n log mn/(m^2 n^2 log mn)) =
O(1/(m^2 n)). The actual start and finish times for each job
can be obtained by later using the partial sums algorithm of
[6].
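
The sort-based construction can be sketched as below. This is a sequential Python illustration, not the parallel procedure: the names `min_mean_finish_schedule` and `total_finish_time` are invented for the example, indices are 0-based, speeds are assumed to be positive integers so that exact fractions can be used, and only the n smallest of the mn numbers q/s_j are consumed since there are only n jobs.

```python
from fractions import Fraction

def min_mean_finish_schedule(p, s):
    """p: processing times in nondecreasing order; s: machine speeds.

    Returns run[j], the list of job indices in the order machine j runs them.
    """
    n, m = len(p), len(s)
    # A job with q-1 jobs after it on machine j contributes p/s_j to the
    # finish times of q jobs, hence the weight q/s_j.
    weights = sorted((Fraction(q, s[j]), j, q)
                     for j in range(m) for q in range(1, n + 1))
    run = [[] for _ in range(m)]
    # The i-th smallest weight receives the i-th largest job.
    for i, (_, j, q) in enumerate(weights[:n]):
        run[j].append((q, n - 1 - i))
    # A larger q means more jobs follow, so that job runs earlier.
    return [[job for _, job in sorted(r, reverse=True)] for r in run]

def total_finish_time(p, s, run):
    """Sum of the finish times of all jobs under the given schedule."""
    total = Fraction(0)
    for j, jobs in enumerate(run):
        t = Fraction(0)
        for job in jobs:
            t += Fraction(p[job], s[j])
            total += t
    return total
```

With p = (1, 2, 3, 4) and speeds (1, 2), the four smallest weights are 1/2, 1, 1 and 3/2, so the largest job goes last on the faster machine and the total finish time is 17/2. Ties between equal weights can be broken either way without changing the total.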


Algorithm MFT

Input:  m processors with speeds 1, s_2, ..., s_m,
        1 ≤ s_2 ≤ ... ≤ s_m; n jobs initially sorted so that
        p_1 ≤ p_2 ≤ ... ≤ p_n, where the times are for
        processor 1.

Output: Sets R_i, 1 ≤ i ≤ m. The jobs in R_i are to be run
        on processor i in increasing order of their
        execution times.

for j ← 1 to m-1 do
    R_j ← ∅; l_j ← 1/s_j
end for
R_m ← {n}; l_m ← 2/s_m
//Note that the above assigns the job with the
largest processing time to the fastest processor,
m.//
for k ← n-1 to 1 do
    Let u be the largest index such that l_u = min{l_i}
    R_u ← R_u ∪ {k}
    l_u ← l_u + 1/s_u
end for
end MFT


Figure 3.1


4. Minimizing the Number of Tardy Jobs


Let J = {(p_i, d_i) | 1 ≤ i ≤ n} define a set of n jobs. p_i is the
processing time of job i and d_i is its due time. Let S be any
one machine schedule for J. Job i is tardy in the schedule
S iff it completes after its due time d_i.

Hodgson and Moore [18] have developed an O(n log n)
sequential algorithm that obtains a schedule that minimizes
the number of tardy jobs. Dekel and Sahni [6] present an
O(log^2 n) parallel algorithm to obtain a schedule with the
fewest number of tardy jobs. This algorithm uses O(n) PEs
and has an EPU of O(1/log n).

In this section, we shall develop a parallel algorithm
for the case when p_i = 1, 1 ≤ i ≤ n. This algorithm will have a
complexity O(log n). It will require O(n^2) PEs and thus have
an EPU that is O(1/n). While the algorithm of this section
has an EPU that is inferior to that of [6], it is faster by
a log n factor. It is interesting to note that the
simplification p_i = 1, 1 ≤ i ≤ n does not lead to a corresponding
speed up for the sequential case.


The problem of finding a schedule that minimizes the
number of tardy jobs is equivalent to that of selecting a
maximum cardinality subset U of J such that every job in U
can be completed by its due time. Jobs not in U can be
scheduled after those in U and will be tardy. A set of jobs
U such that every job in U can be scheduled to complete by
its due time is called a feasible set. It is well known
that a set of jobs U is feasible iff scheduling jobs in U in
nondecreasing order of due times results in no tardy jobs
(see [14], for example).

When p_i = 1, 1 ≤ i ≤ n, a maximum cardinality feasible set U
can be obtained by considering the jobs in nondecreasing
order of due times. The job j currently being considered
can be added to U iff |U| < d_j. Procedure FEAS(J,b) is a
slight generalization. It finds a maximum subset of J that
can be scheduled in the interval [0,b]. DONE(i) is set to -1
if job i is not selected and is set to a number greater than
0 otherwise. If DONE(i) > 0, then job i is to be scheduled
from DONE(i) - 1 to DONE(i). The procedure itself returns a
value that equals the number of jobs selected. The
correctness of FEAS is easily established using an exchange
argument. Its complexity is O(n log n) as it takes this much time
to order the jobs by due time.

line  Procedure FEAS(J,n,b)
      //select a maximum number of jobs for processing//
      //in [0,b]. n=|J|//
 1    set J; integer n,b; global DONE(1:n)
 2    sort J into nondecreasing order of due times
 3    DONE(1:n) ← -1 //initialize//
 4    j ← 0
 5    for i ← 1 to n do
 6      case
 7        :j ≥ b: return(j) //interval full//
 8        :j < d_i: //select i// j ← j+1; DONE(i) ← j
 9      end case
10    end for
11    return(j)
12    end FEAS

Figure 4.1
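
For experimentation, procedure FEAS translates almost line for line into the following sketch. The Python function name `feas` and the 0-based job indices are choices made here; the returned `done` list plays the role of the global array DONE.

```python
def feas(due, b):
    """Greedy selection of unit-time jobs for the interval [0, b].

    due[i] is the due time of job i (0-based). Returns (count, done),
    where done[i] is the completion time of job i, or -1 if job i is
    not selected.
    """
    # Stable sort: jobs with equal due times keep their index order.
    order = sorted(range(len(due)), key=lambda i: due[i])
    done = [-1] * len(due)
    j = 0
    for i in order:
        if j >= b:          # interval full
            break
        if j < due[i]:      # slot [j, j+1] still ends by the due time
            j += 1
            done[i] = j
    return j, done
```

Calling `feas` on the fifteen due times 2, 2, 2, 3, 3, 3, 3, 5, 8, 9, 9, 9, 9, 11, 11 with b = 11 selects eleven jobs, and with b = 5 it stops after five.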


Let J be a set of n unit processing time jobs. Let
D(i), 1 ≤ i ≤ k be the distinct due times of the jobs in J.
Assume that D(i) < D(i+1), 1 ≤ i < k. Let n(i) be the number of
jobs in J with due time D(i), 1 ≤ i ≤ k. Clearly, Σ n(i) = n.
Let D(0)=0 and n(0)=0. Define F(i) to be the value of j
when procedure FEAS (Figure 4.1) has just finished considering
all jobs in J with due time at most D(i). It is evident
that:

$$ F(0) = D(0) = 0 $$
$$ F(i) = \min\{F(i-1)+n(i),\ D(i),\ b\}, \quad 1 \le i \le k \tag{4.1} $$

Expanding the recurrence (4.1), we obtain:

$$ F(1) = \min\{D(0)+n(1),\ D(1),\ b\} $$
$$ F(2) = \min\{F(1)+n(2),\ D(2),\ b\} $$
$$ \phantom{F(2)} = \min\{D(0)+n(1)+n(2),\ D(1)+n(2),\ b+n(2),\ D(2),\ b\} $$
$$ \phantom{F(2)} = \min\{D(0)+n(1)+n(2),\ D(1)+n(2),\ D(2),\ b\} $$
$$ F(3) = \min\{D(0)+n(1)+n(2)+n(3),\ D(1)+n(2)+n(3),\ D(2)+n(3),\ D(3),\ b\} $$

And, in general

$$ F(m) = \min\Big\{ \min_{0 \le i \le m} \Big\{ D(i) + \sum_{q=i+1}^{m} n(q) \Big\},\ b \Big\}, \quad 0 \le m \le k \tag{4.2} $$

The maximum number of jobs in J that can be scheduled
in [0,b], b > 0, so that none is tardy is F(k). F(k) may be
efficiently computed in parallel as follows. Let the due
times of the n jobs in J be d(1), d(2), ..., d(n). Let
d(0)=0. We may assume that d(i) > 0, 1 ≤ i ≤ n. The computation
steps are:

Step 1: sort d(1:n) into nondecreasing order.

Step 2: determine the points r(0), ..., r(k-1) in d(0:n)
        where the due times change, i.e., r(i) <
        r(i+1), 0 ≤ i < k and d(r(i)) ≠ d(r(i)+1). Let
        r(k)=n. Clearly, r(0)=0, and n(i) = r(i)-r(i-1)
        and D(i) = d(r(i)), 1 ≤ i ≤ k; D(0)=0.

Step 3: since D(i) + Σ_{q=i+1}^{k} n(q) = D(i)+n-r(i), we
        compute

$$ F(k) = \min\Big\{ n + \min_{0 \le i \le k} \{D(i)-r(i)\},\ b \Big\} $$ []

Example 4.1: Figure 4.2(a) gives the due times of a set J of
15 jobs. In Figure 4.2(b), the jobs have been ordered by
due times. The points at which the due times change are
shown by heavy lines. We see that k=6; r(0:6) = (0, 3, 7,
8, 9, 13, 15) and D(0:6) = (0, 2, 3, 5, 8, 9, 11). So,

$$ n + \min_{0 \le i \le k} \{D(i)-r(i)\} = 15 + \min\{0, -1, -4, -3, -1, -4, -4\} = 15-4 = 11. $$

If b ≥ 11, then the maximum number of nontardy jobs is 11. []
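
The three computation steps can be mimicked sequentially as in the sketch below. The Python function name `max_nontardy` is an assumption, and the sort, the boundary detection and the minimum are ordinary sequential operations standing in for the parallel sort, the per-PE comparison of neighbours and the binary tree computation.

```python
def max_nontardy(due, b):
    """Return (F(k), r, D) for unit-time jobs with the given due times."""
    n = len(due)
    d = [0] + sorted(due)                 # d(0) = 0, d(1..n) sorted
    # Boundary points: 0, every i with d(i) < d(i+1), and n.
    r = [0] + [i for i in range(1, n) if d[i] < d[i + 1]] + [n]
    D = [d[i] for i in r]                 # D(i) = d(r(i)), D(0) = 0
    best = n + min(Di - ri for Di, ri in zip(D, r))
    return min(best, b), r, D
```

Fifteen due times consistent with r(0:6) and D(0:6) of Example 4.1 are 2, 2, 2, 3, 3, 3, 3, 5, 8, 9, 9, 9, 9, 11, 11 (this list is reconstructed from those two vectors, since Figure 4.2 itself is not reproduced here). For this input the sketch returns 11 for any b ≥ 11.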

With n^2 PEs, step 1 can be carried out in O(log n) time
(see [19] and [21]). Using n-1 PEs, the boundary points can
be found in O(1) time. PE(i) simply checks to see if
d(i) < d(i+1), 1 ≤ i ≤ n-1. If so, then i is a boundary point. 0
and n are also boundary points. The boundary points have
now to be moved into memory positions r(0), r(1), ..., r(k).
This can be done in O(log n) time using n PEs and the data
concentration algorithm of [23]. Another data concentration
step moves d(r(0)), d(r(1)), ..., d(r(k)) into D(0), D(1),
..., D(k). Using k+1 PEs, D(i)-r(i), 0 ≤ i ≤ k can be computed
in O(1) time. min{D(i)-r(i)} can be obtained in O(log k)
time using the binary tree computation model of [6] (Figure
4.3 shows this for our example). As explained in [6], only
O(k/log k) PEs are needed for this; but using k/2 PEs is


this is adequate.

To obtain the actual schedule, we may
proceed as follows. First, modify procedure FEAS by adding
the line:

8.1    :else: DONE(i) ← j

and by deleting line 7.

It is easy to see that job i is completed at time
DONE(i) iff DONE(i) ≤ b and DONE(i-1) < DONE(i), 1 ≤ i ≤ n. For
the modified algorithm, we see that:

$$ \mathrm{DONE}(0) = 0 $$
$$ \mathrm{DONE}(i) = \min\{\mathrm{DONE}(i-1)+1,\ d_i\}, \quad 1 \le i \le n \tag{4.3} $$

Solving (4.3), we obtain:

$$ \mathrm{DONE}(i) = \min_{0 \le j \le i} \{d_j + i - j\}, \quad 1 \le i \le n \tag{4.4} $$

DONE(i), 1 ≤ i ≤ n may be computed in O(log n) time using n^2
PEs (though n^2/log n are sufficient) and the binary
computation tree model (see [6] and Figure 4.3). Since the initial
sort takes O(log n) time and requires n^2 PEs, the overall
time complexity is O(log n) and the EPU is O(1/n). From
DONE(i), the schedule is easily obtained.

Example 4.2: For the sorted data of Example 4.1, we obtain
DONE(0:15) = (0, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10,
11). So the set of nontardy jobs in [0,b], b ≥ 11 is {2, 4,
1, 3, 11, 5, 8, 12, 14, 7, 9}. By concentrating these to
the left, we obtain the permutation (2, 4, 1, 3, 11, 5, 8,
12, 14, 7, 9, 15, 6, 13, 10) which represents an optimal
schedule. []
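
Recurrence (4.3) and its closed form (4.4) can be checked against each other with the short sketch below. The function names are invented for the example. The closed form is evaluated by first rewriting min over j of (d_j + i - j) as i plus a running minimum of d_j - j, which is a partial sums computation with min as the operator; this rewriting is an observation made for the sketch and is not claimed to be how DONE is computed in the source.

```python
from itertools import accumulate

def done_by_recurrence(d_sorted):
    """DONE(1..n) from (4.3): DONE(i) = min{DONE(i-1) + 1, d_i}."""
    out, prev = [], 0
    for di in d_sorted:
        prev = min(prev + 1, di)
        out.append(prev)
    return out

def done_closed_form(d_sorted):
    """DONE(1..n) from (4.4), with d_0 = 0."""
    d = [0] + list(d_sorted)
    # m[i] = min over 0 <= j <= i of (d_j - j), a running minimum.
    m = list(accumulate((dj - j for j, dj in enumerate(d)), min))
    return [i + m[i] for i in range(1, len(d))]

def selected_positions(done):
    """1-based positions i (in sorted order) with DONE(i-1) < DONE(i)."""
    prev, picked = 0, []
    for i, v in enumerate(done, start=1):
        if prev < v:
            picked.append(i)
        prev = v
    return picked
```

For the sorted due times 2, 2, 2, 3, 3, 3, 3, 5, 8, 9, 9, 9, 9, 11, 11 both functions give DONE(1:15) = (1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11), and eleven positions are selected. The positions refer to the sorted order, not to the original job numbers of Figure 4.2.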


5. Job Sequencing With Deadlines

In this problem, we are given a set J of n jobs. Associated
with job i is a profit z_i and a due time d_i, 1 ≤ i ≤ n. Every
job has a processing requirement of one unit. If job i is
completed by time d_i, then a profit z_i, z_i > 0 is made. If
job i is not completed by the time d_i, then nothing is
earned. We wish to select a feasible subset of J that
yields maximum return (recall that R is a feasible subset
iff all jobs in R can be scheduled to complete on time).

One way to find a feasible subset R of J that gives
maximum return is:

Step 1: sort J into nonincreasing order of z_i
Step 2: R ← {1}
        for i ← 2 to n do
            if R ∪ {i} is feasible then R ← R ∪ {i}
        end for


Figure 5.1


A correctness proof of the above procedure may be found
in [14]. It is also possible to implement the above scheme
by a sequential algorithm of complexity O(n log n). For the
parallel version, we reduce the job sequencing with
deadlines problem into 2n independent feasibility problems.
First, we note that if R1 and R2 are feasible subsets of J
and if R1 is one with maximum return, then |R2| ≤ |R1|.

Theorem 5.1: Let A be a feasible subset of J that yields
maximum return. Let B be any feasible subset of J. Then
|B| ≤ |A|.

Proof: Since A and B are feasible subsets, they can
respectively be scheduled in [0,|A|] and [0,|B|] in such a manner
that no job is tardy. Consider such a scheduling SA of A
and SB of B. Consider a job i that is in both A and B. If i
is scheduled earlier in SA than in SB, we may change SA by
moving i to the slot it is scheduled in SB. This would
require moving the job (if any) scheduled in this slot in SA
to the position previously occupied by i (see Figure
5.2(a)). A similar transformation may be made if i is
scheduled later in SA than in SB (see Figure 5.2(b)).

By performing the above transformation on all jobs in A
∩ B, we obtain schedules SA' and SB' that contain no tardy
jobs. In addition, jobs in A ∩ B are scheduled in the same
slots in SA' and SB'.

If |B| > |A|, then there must be a job j scheduled in SB'
in a slot that is empty in SA'. Also, j ∉ A. By adding j
to A, we clearly obtain a feasible set with return larger
than that obtained from A. This contradicts the assumption
on A. So, |B| ≤ |A|. []

From the sequential algorithm for the job sequencing
problem and Theorem 5.1, we may derive a parallel algorithm.
Let T1(i) = {j | z_j > z_i or (z_j = z_i and j < i)} and T2(i) =
T1(i) ∪ {i}. Consider a schedule for T1(i) that has the
fewest number of tardy jobs. Let x(i) be the number of
nontardy jobs in this schedule. Let y(i) be the corresponding
number for T2(i). From our discussions, it follows that job
i will be included in R (Figure 5.1) iff y(i) > x(i).
Hence, R may be obtained by computing x(i) and y(i), 1 ≤ i ≤ n.
x(i) and y(i) may be computed using the parallel algorithm
for F(k) described in Section 4. From R, the optimal
schedule is obtained by scheduling the jobs in R first, in
order of due times; and then scheduling the remaining jobs
in any order. This construction can be carried out by first
concentrating the jobs in R and then sorting them by due
times.

Figure 5.2: Lining up common jobs.

Example 5.1: Figure 5.3(a) shows an example job set with 12
jobs. These have been ordered by due times in Figure
5.3(b). Figure 5.4 gives T2(i), 1 ≤ i ≤ n. The number of
nontardy jobs in the optimal schedules for T1(i) and T2(i) is
respectively given in (x(i),y(i)). It also tells if job i
is to be included in R. R is seen to be {1, 3, 5, 6, 8, 9,
11, 12}. These jobs may be concentrated to one end to
obtain Figure 5.5. This gives the optimal schedule. []
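
The inclusion test y(i) > x(i) can be exercised with the sketch below. It is a sequential Python illustration: `nontardy_count` uses the simple greedy scan of Section 4 in place of the parallel computation of F(k), and both function names are assumptions made for the example. The n tests are independent of one another, which is what allows one group of PEs per job.

```python
def nontardy_count(dues):
    """Maximum number of unit-time jobs that can meet the given due times."""
    j = 0
    for d in sorted(dues):
        if j < d:
            j += 1
    return j

def select_jobs(z, d):
    """Return the 1-based indices i for which y(i) > x(i)."""
    n = len(z)
    R = []
    for i in range(n):                       # independent for each job i
        t1 = [d[j] for j in range(n)
              if z[j] > z[i] or (z[j] == z[i] and j < i)]
        x = nontardy_count(t1)               # best count for T1(i)
        y = nontardy_count(t1 + [d[i]])      # best count for T2(i)
        if y > x:
            R.append(i + 1)
    return R
```

With profits z = (85, 55, 65, 80, 70, 85, 63, 80, 75, 60, 85, 60) and due times d = (2, 6, 6, 2, 6, 2, 6, 6, 7, 3, 7, 8), a reading of the data in Figure 5.4 in which some digits had to be inferred, the sketch selects jobs 1, 3, 5, 6, 8, 9, 11 and 12, the same set R as in Example 5.1.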


Figure 5.3: Example job set; (a) as given, (b) ordered by due times.
 i  z_i  jobs in T2(i)                 due times                 (x(i),y(i))  job i
 1   85  1                             2                         (0,1)        include
 2   55  1 4 6 10 2 3 5 7 8 9 11 12    2 2 2 3 6 6 6 6 6 7 7 8   (8,8)        reject
 3   65  1 4 6 3 5 8 9 11              2 2 2 6 6 6 7 7           (6,7)        include
 4   80  1 4 6 11                      2 2 2 7                   (3,3)        reject
 5   70  1 4 6 5 8 9 11                2 2 2 6 6 7 7             (5,6)        include
 6   85  1 6                           2 2                       (1,2)        include
 7   63  1 4 6 3 5 7 8 9 11            2 2 2 6 6 6 6 7 7         (7,7)        reject
 8   80  1 4 6 8 11                    2 2 2 6 7                 (3,4)        include
 9   75  1 4 6 8 9 11                  2 2 2 6 7 7               (4,5)        include
10   60  1 4 6 10 3 5 7 8 9 11         2 2 2 3 6 6 6 6 7 7       (7,7)        reject
11   85  1 6 11                        2 2 7                     (2,3)        include
12   60  1 4 6 10 3 5 7 8 9 11 12      2 2 2 3 6 6 6 6 7 7 8     (7,8)        include

Figure 5.4
Figure 5.5: An optimal schedule

As far as the complexity is concerned, the initial sort by
due times can be done in O(log n) time using n^2 PEs. Next,
we need to replicate this sorted data into n copies, one to
be used for each T1(i) and T2(i). This replication can be
carried out using n^2 PEs and spending O(log n) time (the
O(log n) time is needed to avoid read conflicts). Now, the
n^2 PEs are divided into n groups of n PEs each. Group i
computes T1(i) and then T2(i). T1(i) is obtained by having
the jth PE in group i flag job j iff z_j > z_i or (z_j = z_i and
j < i). Next, the flagged jobs are concentrated in O(log n)
time using the n PEs in each group. Note that this
concentration preserves the due time ordering. The n PEs in group
i next compute x(i) = F(k), 1 ≤ i ≤ n. This takes O(log n)
time. y(i), 1 ≤ i ≤ n is computed in a manner similar to that
used to obtain x(i).

Having obtained x(i) and y(i), n PEs are used to
determine if y(i) > x(i), 1 ≤ i ≤ n. The selected jobs can be
concentrated in O(log n) time using these n PEs. The
concentration preserves the due time ordering of the selected jobs.

The overall complexity of our parallel algorithm is
therefore O(log n). It uses n^2 PEs and has an EPU of O(1/n).
This should be contrasted with the algorithm presented by us
in [6] for the same problem. That algorithm has a
complexity of O(log^2 n) but uses only O(n) PEs. Thus, its EPU is
O(1/log n).


6. Earliness and Tardiness Penalties

Let J be a set of n jobs. Associated with each job is a
target start time a_i, a target due time b_i, and a processing
time p_i. Any one machine schedule S for J may be denoted by
a vector (s_1, s_2, ..., s_n) where s_i is the start time of job i.
Schedule S is admissible iff s_i ≥ s_{i-1} + p_{i-1}, 2 ≤ i ≤ n. The
completion time c_i of job i is s_i + p_i. The earliness e_i
and tardiness t_i of job i are given by:

$$ e_i = \max\{0,\ a_i - s_i\} $$
$$ t_i = \max\{0,\ c_i - b_i\} $$

If job i is early (i.e., e_i > 0) then it incurs a
penalty g(e_i). If it is tardy (i.e., t_i > 0), then it incurs a
penalty h(t_i). The objective is to find a schedule S that
minimizes the maximum penalty. This problem was first
studied by Sidney [23]. He obtained an O(n^2) algorithm for the
case when:

(1) a_i < a_j implies b_i ≤ b_j

and

(2) g( ) and h( ) are monotone nondecreasing continuous
functions such that g(0) = h(0) = 0.

Our notations and definitions are taken from Sidney's
paper. Sidney's O(n^2) algorithm was subsequently improved
to O(n log n) by Lakshminarayan et al. The parallel algorithm
we shall develop here is based on the algorithm of
Lakshminarayan et al. [16].

The algorithm of [16] first finds an admissible
schedule S using procedure ADMIS (Figure 6.1). This
procedure assumes that the jobs are ordered by target start
times (i.e., a_i ≤ a_{i+1}) and within start times by target due
times (i.e., a_i = a_{i+1} implies b_i ≤ b_{i+1}). The maximum lateness,
A, in S is next computed. If A = 0, then S is clearly optimal
(as max{e_i} = max{t_i} = 0). If A > 0, then E is computed using
one of their lemmas. Finally, all the start times in S are
decreased by E. The new schedule is optimal.


Procedure ADMIS(a,p,s,n)
//jobs are ordered by target start and due times//
declare n, a(1:n), p(1:n), s(1:n)
s_1 ← a_1
for i ← 2 to n do
    s_i ← max{a_i, s_{i-1} + p_{i-1}}
end for
end ADMIS


Figure 6.1


A can be computed in O(log n) time using n PEs (see
[6]). As described in [16], E may be computed in O(1) time
using 1 PE. Once E has been obtained, n copies of it can
be made in O(log n) time using n PEs. Finally, the s_i's can
be updated in O(1) time using n PEs. Also, the initial
ordering of the jobs may be carried out in O(log n) time with n^2
PEs. All that remains is the computation of the admissible
schedule. From Figure 6.1, we see that

$$ s_i = \max\{a_i,\ s_{i-1} + p_{i-1}\}, \quad 2 \le i \le n \tag{6.1} $$

Expanding the recurrence (6.1), we obtain

$$ s_i = \max_{1 \le j \le i} \Big\{ a_j + \sum_{k=j}^{i-1} p_k \Big\}, \quad 1 \le i \le n \tag{6.2} $$

It should be easy to see that using (6.2) and O(n^3)
PEs, one can compute all the s_i's in O(log n) time. We shall
devote the remainder of the section to the development of an
O(log n) algorithm that utilizes only n/2 PEs. As we shall
see later, ⌈n/log n⌉ PEs are all that is needed.
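
The equivalence of the recurrence (6.1) and its expanded form (6.2) can be checked numerically. The following is a minimal Python sketch, assuming 0-based lists and s_0 = a_0 (the value (6.2) gives for the first job); the function names and the sample data are illustrative and do not come from [16].

```python
def start_times_recurrence(a, p):
    """Recurrence (6.1): s[i] = max(a[i], s[i-1] + p[i-1]), with s[0] = a[0]."""
    s = [a[0]]
    for i in range(1, len(a)):
        s.append(max(a[i], s[i - 1] + p[i - 1]))
    return s


def start_times_closed_form(a, p):
    """Expanded form (6.2): s[i] = max over j <= i of a[j] + p[j] + ... + p[i-1]."""
    # sum(p[j:i]) is the sum of p[j..i-1]; it is empty (zero) when j == i.
    return [max(a[j] + sum(p[j:i]) for j in range(i + 1))
            for i in range(len(a))]


a = [0, 1, 5, 5]   # hypothetical target start times, already sorted
p = [2, 2, 1, 3]   # hypothetical processing times
print(start_times_recurrence(a, p))
print(start_times_closed_form(a, p))
```

Each entry of the closed form is independent of the others, which is what makes a brute-force parallel evaluation possible at the cost of many PEs.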

For convenience, we shall assume that the jobs are
indexed 0,1,...,n-1 rather than 1,2,...,n. Before describing
the algorithm, we develop some terminology. Let S(0:n-1)
be an array. A 2^k-block of S consists of all elements of
S whose indices differ only in the least significant k bits.
The 2^1-blocks of A(0:10) are [0,1], [2,3], [4,5], [6,7],
[8,9], and [10]; the 2^2-blocks are [0,1,2,3], [4,5,6,7], and
[8,9,10]; etc. Two 2^k-blocks are sibling blocks iff their
union is a 2^{k+1}-block. Thus, [0,1] and [2,3] are sibling
blocks; so also are [0,1,2,3] and [4,5,6,7]. However, [2,3]
and [4,5] are not sibling blocks.
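
The block terminology can be made concrete with a short Python sketch. The helper names `blocks` and `are_siblings` are hypothetical; the sibling test assumes both arguments are 2^k-blocks given as index lists.

```python
def blocks(n, k):
    """Index sets of the 2^k-blocks of an array indexed 0..n-1."""
    size = 1 << k
    # The last block is shorter when n is not a multiple of 2^k.
    return [list(range(lo, min(lo + size, n))) for lo in range(0, n, size)]


def are_siblings(x, y, k):
    """Two distinct 2^k-blocks are siblings iff their indices agree
    on every bit above bit k, i.e. their union is a 2^(k+1)-block."""
    return x[0] != y[0] and (x[0] >> (k + 1)) == (y[0] >> (k + 1))


print(blocks(11, 1))
print(blocks(11, 2))
print(are_siblings([0, 1], [2, 3], 1), are_siblings([2, 3], [4, 5], 1))
```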

Let A(0:n-1) and P(0:n-1) be the target start times and
the processing times. Let [i, i+1, i+2, ..., r] be the index
set for any 2^k-block (a 2^k-block has 2^k indices unless it is
the last 2^k-block). With respect to this 2^k-block, we
define

       $S(j) = \sum_{q=i}^{j-1} P(q)$,  j is an index in this block

(6.3)  $T(j) = \sum_{q=i}^{r} P(q)$,  r is the highest index in the block

       $Q(j) = \max_{i \le q \le j} \Big\{ A(q) + \sum_{t=q}^{j-1} P(t) \Big\}$,  j is a block index

       $U(j) = Q(r) + P(r)$,  j is a block index

For a 2^0-block [i], we have:

(6.4)  $S(i) = 0;\quad T(i) = P(i);\quad Q(i) = A(i);\quad U(i) = A(i) + P(i)$


Let X = [i, i+1, ..., u] and Y = [u+1, ..., v] be two
sibling 2^k-blocks. Their union Z = [i, i+1, ..., v] is a
2^{k+1}-block. Let S, T, Q, and U be the values defined in
(6.3) with respect to the 2^k-blocks. Let S', T', Q', and U'
be the values defined with respect to the 2^{k+1}-block Z.
From (6.3), we see that:

(6.5a)  $S'(j) = \begin{cases} S(j) & \text{if } j \in X \\ S(j) + T(i) & \text{if } j \in Y \end{cases}$

(6.5b)  $T'(j) = \begin{cases} T(j) + T(u+1) & \text{if } j \in X \\ T(j) + T(i) & \text{if } j \in Y \end{cases}$

(6.5c)  $Q'(j) = \begin{cases} Q(j) & \text{if } j \in X \\ \max\{Q(j),\; U(i) + S(j)\} & \text{if } j \in Y \end{cases}$

(6.5d)  $U'(j) = Q'(v) + P(v)$


One also notes that with respect to the entire
2^{⌈log_2 n⌉}-block [0,1,2,...,n-1],

$Q(j) = \max_{0 \le q \le j} \Big\{ A(q) + \sum_{t=q}^{j-1} P(t) \Big\} = s_j$ of (6.2)

Our strategy is to compute the admissible schedule
obtained by procedure ADMIS by using (6.5a-d). We start
with the S, T, Q, and U values for 2^0-blocks as given by
(6.4). Next, using (6.5a-d), the S, T, Q, and U values for
2^1-blocks are obtained; then for 2^2-blocks; then for
2^3-blocks; etc., until we have obtained the Q values for the
entire 2^{⌈log_2 n⌉}-block.
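
The bottom-up strategy can be sketched sequentially in Python as a merge of two adjacent blocks. This is only an illustration under stated assumptions: T and U are stored once per block because they are constant within a block, the helper names are hypothetical, and the recursion splits at the midpoint rather than at aligned power-of-two boundaries (the merge identities hold for any two adjacent index ranges). U' is computed as max{U_Y, U_X + T_Y}, which equals Q'(v) + P(v).

```python
def leaf(a_i, p_i):
    # Values for a one-element block; U = Q + P as in the definition of U.
    return {"S": [0], "T": p_i, "Q": [a_i], "U": a_i + p_i}


def merge(x, y):
    """Combine left block x and its right neighbour y."""
    return {
        # S': unchanged in X, shifted by the total processing time of X in Y.
        "S": x["S"] + [s + x["T"] for s in y["S"]],
        # T': total processing time of the union.
        "T": x["T"] + y["T"],
        # Q': unchanged in X; in Y also consider a start forced by X.
        "Q": x["Q"] + [max(q, x["U"] + s) for q, s in zip(y["Q"], y["S"])],
        # U': equals Q'(v) + P(v) for the last index v of the union.
        "U": max(y["U"], x["U"] + y["T"]),
    }


def block_values(a, p, lo, hi):
    """S, T, Q, U for the index range lo..hi-1."""
    if hi - lo == 1:
        return leaf(a[lo], p[lo])
    mid = (lo + hi) // 2
    return merge(block_values(a, p, lo, mid), block_values(a, p, mid, hi))


a = [0, 1, 5, 5]   # hypothetical target start times
p = [2, 2, 1, 3]   # hypothetical processing times
print(block_values(a, p, 0, len(a))["Q"])   # the admissible start times
```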

Example 6.1: Figure 6.2 gives a set of 10 jobs (indexed 0
through 9). The first row of Figure 6.3 gives the S, T, Q,
and U values for the 2^0-blocks; etc. The numbers with
arrows give PE assignments. From the bottommost row, we
obtain s = (0,3,5,8,12,13,16,20,21,24) as the admissible
schedule. []
Figure 6.2: An example data set


Let us now proceed to the formal algorithm. In the
actual algorithm, processors are assigned to compute the new
values of S, T, Q, and U. Assume that the PEs are indexed
0, 1, ..., ⌊n/2⌋-1. With respect to our example of Figure
6.3, when k=0, PE(0) will compute the new values of S(1),
T(0), T(1), Q(1), U(0), and U(1); PE(1) will compute S(3),
T(2), T(3), Q(3), U(2), and U(3); etc. When k=1, PEs 0 and 1
are both assigned to the new 2^2-block [0,1,2,3] being
constructed. PEs 2 and 3 are assigned to the block [4,5,6,7].
PEs 4 and 5 are idle.

Figure 6.3: Computing the admissible schedule


Let ... i_3 i_2 i_1 i_0 be the binary representation of i.
The PE assignment rule is obtained by defining the function
f(i,j) = ... i_{j+1} i_j 0 i_{j-1} ... i_0. When 2^k-blocks are being
combined, PE(i) computes S(f(i,k)+2^k), T(f(i,k)),
T(f(i,k)+2^k), Q(f(i,k)+2^k), U(f(i,k)), and U(f(i,k)+2^k)
(provided of course that all these indices are less than n).
The formal algorithm is given in Figure 6.4. This algorithm
mirrors equations (6.5a-d). Some minor modifications have
however been made. Since T( ) is the same for all indices in
a 2^k-block, S(j)+T(i) of (6.5a) has been replaced by
S(j)+T(j-2^k). Similarly, T(j)+T(u+1) has been replaced by
T(j)+T(j+2^k), and U(i)+S(j) by U(j-2^k)+S(j). Note that as
a result of this change, new T and U values for the rightmost
block may be incorrect (as j+2^k may exceed n-1). This
does not affect the outcome of the algorithm as the T and U
values of rightmost blocks are never used. One may verify
that max{U(j+2^k), U(j)+T(j+2^k)} = Q'(v)+P(v) (Eq. 6.5d). When
k = ⌈log n⌉-1, only Q need be computed.
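
The assignment function f(i,j) inserts a 0 bit at position j of i. A minimal Python sketch, assuming nonnegative integers (the bit manipulation is one possible realization, not something the paper prescribes):

```python
def f(i, j):
    """Insert a 0 at bit position j of i:
    bits i_j and above move up one place, bits below j stay put."""
    low = i & ((1 << j) - 1)          # i_{j-1} ... i_0
    return ((i >> j) << (j + 1)) | low


# Left index handled by each of 5 PEs when 2^k-blocks are combined.
for k in range(3):
    print(k, [f(i, k) for i in range(5)])
```

With k=0 the PEs are assigned to indices 0, 2, 4, ...; with k=1 PEs 0 and 1 receive indices 0 and 1, and PEs 2 and 3 receive indices 4 and 5, which matches the assignment described in the text.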

procedure PADMIS(A,P,s,n)
   //obtain the admissible schedule (s_0, s_1, ..., s_{n-1})//
   declare n, A(0:n-1), P(0:n-1), S(0:n-1), T(0:n-1)
   declare Q(0:n-1), U(0:n-1), j, i
   for each PE(i) do in parallel
      j ← f(i,0)
      //initialize 2^0-blocks//
      S(j) ← 0; T(j) ← P(j); Q(j) ← A(j); U(j) ← A(j)+P(j)
      S(j+1) ← 0; T(j+1) ← P(j+1)
      Q(j+1) ← A(j+1); U(j+1) ← A(j+1)+P(j+1)
      for k ← 0 to ⌈log_2 n⌉-1 do
         //combine 2^k-blocks//
         j ← f(i,k)   //PE assignment//
         if j+2^k < n then
            Q(j+2^k) ← max{Q(j+2^k), U(j)+S(j+2^k)}
            U(j+2^k) ← max{U(j+2^k), U(j)+T(j+2^k)}
            U(j) ← U(j+2^k)
            S(j+2^k) ← S(j+2^k)+T(j)
            T(j+2^k) ← T(j)+T(j+2^k)
            T(j) ← T(j+2^k)
         endif
      end for
   end for
   s_i ← Q(i), 0 ≤ i < n
end PADMIS

Figure 6.4: Parallel admissible schedule algorithm.
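
The behaviour of PADMIS can be explored with a sequential simulation. The Python sketch below is an illustration, not the authors' implementation: the inner loop over i plays the role of the ⌊n/2⌋ PEs (each PE touches only its own pair of indices j and j+2^k, so running the PEs one after another gives the same result as running them in lockstep), every entry is initialized directly so that odd n is also covered (an assumption of this sketch), and ⌈log_2 n⌉ is obtained from `int.bit_length`.

```python
def f(i, j):
    """Insert a 0 at bit position j of i (PE assignment rule)."""
    return ((i >> j) << (j + 1)) | (i & ((1 << j) - 1))


def padmis(A, P):
    """Simulate PADMIS; returns the admissible start times Q(0..n-1)."""
    n = len(A)
    S = [0] * n
    T = list(P)
    Q = list(A)
    U = [A[j] + P[j] for j in range(n)]
    for k in range((n - 1).bit_length()):      # k = 0 .. ceil(log2 n) - 1
        step = 1 << k
        for i in range(n // 2):                # one iteration per PE(i)
            j = f(i, k)
            r = j + step
            if r < n:
                Q[r] = max(Q[r], U[j] + S[r])
                U[r] = max(U[r], U[j] + T[r])
                U[j] = U[r]
                S[r] = S[r] + T[j]
                T[r] = T[j] + T[r]
                T[j] = T[r]
    return Q


print(padmis([0, 1, 5, 5], [2, 2, 1, 3]))      # hypothetical data
```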


The complexity of PADMIS is readily seen to be O(log n).
It uses n/2 PEs. By dividing the jobs into ⌈n/log n⌉
groups, each of size at most log n, it is possible to compute
the s_i's in O(log n) time with ⌈n/log n⌉ PEs. This requires
combining the sequential and parallel algorithms together. We
omit the details. However, this grouping technique has been
used in other problems. The details can be found in [6]. With
this grouping technique, the parallel admissible schedule
algorithm will have an EPU of O(1).


The overall complexity of the parallel algorithm to
minimize earliness and tardiness penalties is determined by
the sort (to order jobs). This takes O(log n) time and uses
n^2 PEs. The EPU is O(1/n).


7. Channel Assignment

The channel assignment problem occurs naturally as a wire
routing problem. Components of an electrical circuit are
laid out in a straight line as in Figure 7.1. Certain
pairs of components are to be connected using only two
vertical runs and one horizontal run of wire (as in Figure 7.1).
The horizontal and vertical runs are physically located in
different layers. Each horizontal run of wire lies in a
'channel'. No channel can simultaneously carry more than
one wire. We are required to assign the horizontal wire
runs to channels, using the least number of channels. The
assignment of Figure 7.1 uses 3 channels.


Figure 7.1: Wiring with 3 channels.


In the mathematical formulation of this problem, we are
given n pairs of points (a_i,b_i), 1 ≤ i ≤ n. Each pair (a_i,b_i)
is to be joined by a continuous horizontal run of wire.
These wires are to be assigned to channels in such a way
that the number of channels used is minimum. In the example
of Figure 7.1, n=4; the pairs of points are (1,4), (2,5),
(3,7), and (6,8); the channel assignment is: (1,4) and (6,8)
in channel 1, (2,5) in channel 2, and (3,7) in channel 3.

The job sequencing problem with release times and due
times [3] is similar to the channel assignment problem.
Suppose we are given a set J of n jobs. Associated with
each job i is a release time r_i, a due time d_i, and a
processing time p_i. A feasible schedule is one in which no job is
processed before its release time; all jobs complete by
their respective due times; and jobs are processed without
interruption from start to finish. We are required to find a
feasible schedule that uses the fewest number of machines.
One readily sees that when r_i+p_i=d_i, 1 ≤ i ≤ n, this problem is
identical to the channel assignment problem. When this
restriction on r_i, p_i, and d_i is removed, the problem is
NP-hard.

The fastest sequential algorithm known for the channel
assignment problem is due to Gupta, Lee, and Leung [9].
This algorithm runs in O(n log n) time and consists of the
following steps:

step 1: Sort the multiset {a_i | 1 ≤ i ≤ n} ∪ {b_i | 1 ≤ i ≤ n}
        of the 2n end points into nondecreasing order.
step 2: m ← 0; stack ← empty
step 3: process the 2n points one by one
        if the point being processed is an a_i
        then if stack empty then m ← m+1
                                 assign this wire run to channel m
                            else unstack a channel from
                                 the stack and assign the wire to this
                                 channel
             endif
        else put the channel used by this wire onto the
             stack
        endif
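
The three steps translate almost line for line into Python. The sketch below is illustrative: the function name and the four sample wires are assumptions, and ties between a start point and an end point at the same coordinate are broken by placing the start point first, a choice the step description leaves open.

```python
def assign_channels(wires):
    """wires: list of (a, b) pairs with a < b.
    Returns (channel of each wire, number of channels used)."""
    events = []
    for w, (a, b) in enumerate(wires):
        events.append((a, 0, w))   # kind 0: start point a
        events.append((b, 1, w))   # kind 1: end point b
    events.sort()                  # step 1

    m = 0                          # step 2
    stack = []
    channel = [0] * len(wires)
    for _, kind, w in events:      # step 3
        if kind == 0:
            if not stack:
                m += 1
                channel[w] = m     # open a new channel
            else:
                channel[w] = stack.pop()   # reuse the most recently freed one
        else:
            stack.append(channel[w])       # this wire's channel becomes free
    return channel, m


print(assign_channels([(1, 4), (2, 5), (3, 7), (6, 8)]))
```

For these sample wires the stack discipline places (6,8) in the channel freed last, so the assignment it produces can differ from other valid 3-channel assignments while using the same number of channels.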


In the above three step algorithm of [9], the final
value of m is the fewest number of channels needed. The
assignment is constructed while this number is being
determined. It is possible to determine this number without
actually obtaining a channel assignment. Let c_1, c_2, ...,
c_{2n} be the sorted sequence of 2n end points. Set z_i = 1 if c_i
is an a_j and z_i = -1 if c_i is a b_j. It is easy to see that

$r_j = \sum_{i=1}^{j} z_i$

gives the number of wires that either start at c_j or cross
the point c_j. Further, $\max_{1 \le j \le 2n} \{r_j\}$ is the number of
channels needed to route the n wire segments.


The r_i's, 1 ≤ i ≤ 2n, can be computed using the partial sums
algorithm of [6]. This algorithm takes O(log n) time and uses
⌈n/log n⌉ PEs. The largest r_i can be found in O(log n)
time using ⌈n/log n⌉ PEs. The initial ordering of the a's
and b's can be done in O(log n) time using n^2 PEs. If this
sorting algorithm is used, the resulting parallel algorithm
to determine the fewest number of channels has a time
complexity of O(log n) and an EPU of O(1/n). If the O(log^2 n), n
PE sorting algorithm of [21] is used instead, the time
complexity is O(log^2 n) and the EPU is O(1/log n).
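
The counting argument reduces to a sort, a prefix sum, and a maximum, each of which has a standard parallel counterpart. A sequential Python sketch, using `itertools.accumulate` for the partial sums (the function name, the sample wires, and the start-before-end tie rule are assumptions of this sketch):

```python
from itertools import accumulate


def fewest_channels(wires):
    """Return (r, max(r)) where r[j] counts the wires that start at
    or cross the j-th sorted end point."""
    points = sorted([(a, 0) for a, _ in wires] + [(b, 1) for _, b in wires])
    z = [1 if kind == 0 else -1 for _, kind in points]   # +1 for a, -1 for b
    r = list(accumulate(z))                              # partial sums
    return r, max(r)


print(fewest_channels([(1, 4), (2, 5), (3, 7), (6, 8)]))
```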
Example 7.1: Figure 7.2 gives a set of n wires. Figure 7.3
shows the results of the different steps of the parallel
algorithm to determine the fewest number of channels needed.
This number is 4. []

Figure 7.2

Figure 7.3


The actual channel assignment can be obtained from the
r_j's (recall that $r_j = \sum_{i=1}^{j} z_i$), 1 ≤ j ≤ 2n. Assume that c_j
corresponds to a_k. Let q be the largest index such that
q < j, r_q = r_j - 1, and c_q corresponds to a b (say b_p). If no
such q exists, set q to 0. An examination of the algorithm
of Gupta et al. reveals that if q = 0, then the channel used
by (a_k,b_k) has not been used earlier. If q ≠ 0, then it was
most recently used in the interval (a_p,b_p). To see the
truth of this, note that at point b_p, the channel assigned
to (a_p,b_p) is put into the stack. This channel remains in
the stack until we reach the nearest point at which the
number of wires that start or cross is one more than the
number at b_p (if a_i = a_j and i < j, then we say that a_i is before
a_j). For every j such that c_j is an a point (say a_k), let
L(k) = p as defined above (L(k) = 0 if q = 0).

L( ) partitions the set of n wires into sets. Figure
7.4 gives the partitioning for the example of Figure 7.3.
Each wire is represented by a circle. The circle with index
i inside it represents the wire (a_i,b_i). L(j) may be
interpreted as a left link. Figure 7.4 shows the partitions
as linked lists with L( ) being shown as a leftward arrow.
We leave it to the reader to see how the L( ) values may be
obtained in O(log n) time using n^2/log n PEs.


Figure 7.4: Partitions for Example 7.3


The channel assignment Q(k) for a wire k with L(k) = 0 is
obtained from the r value corresponding to a_k.
If L(k) ≠ 0, we may initially set Q(k) = 0. The actual
channel assignments for wires with L(k) ≠ 0 may be obtained
by simultaneously collapsing the linked lists and transmitting
the channel assignment within the lists as below:

for j ← 1 to ⌈log n⌉ do
   for each i for which Q(i) = 0 do in parallel
      if L(L(i)) = 0 then Q(i) ← Q(L(i))
      L(i) ← L(L(i))
   end for
end for

The parallel complexity of the above scheme is O(log n).
Therefore, the overall complexity of our parallel channel
assignment algorithm is O(log n) (i.e., using the O(log n), n^2
PE sorting algorithm); its EPU is O(1/n).


8. Conclusions

The extent to which parallel computers will find application
will depend largely on our ability to find efficient
algorithms for them. In this paper we have examined several
scheduling problems. The single processor algorithm for
each of these appeared to be highly sequential in nature. A
closer look revealed a parallel structure that led to
efficient parallel algorithms. Several other scheduling
problems can be solved efficiently using the techniques of this
paper and of [22].

Some examples are:

(a) 2 machine flow shop scheduling to minimize finish
    time

(b) 2 machine open shop scheduling to minimize finish
    time

(c) 2 machine flow shop scheduling, with no wait in
    process, to minimize finish time


The parallel algorithms for the above problems involve
a rather straightforward application of parallel sorting and
partial sums. For example, consider problem (a). Here, we
simply divide the job set into two classes: (i) jobs which
need less time on machine 1 than on 2, and (ii) remaining jobs.
Jobs in (i) are sorted into nondecreasing order of their
machine 1 processing times. Jobs in (ii) are sorted into
nonincreasing order of their machine 2 processing times. The
optimal processing permutation consists of jobs in (i) in
sorted order followed by those in (ii) in sorted order. One
readily sees that this permutation satisfies Jackson's rule
[15].
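
The two-class ordering for problem (a) can be sketched in Python. The sketch assumes the usual form of the two-machine flow shop rule, in which class (ii) is taken in nonincreasing order of machine 2 time; the function names and the five sample jobs are hypothetical.

```python
def flow_shop_order(jobs):
    """jobs: list of (t1, t2) processing times on machines 1 and 2.
    Returns job indices: class (i) by nondecreasing t1, then
    class (ii) by nonincreasing t2."""
    class_i = sorted((i for i, (t1, t2) in enumerate(jobs) if t1 < t2),
                     key=lambda i: jobs[i][0])
    class_ii = sorted((i for i, (t1, t2) in enumerate(jobs) if t1 >= t2),
                      key=lambda i: jobs[i][1], reverse=True)
    return class_i + class_ii


def finish_time(jobs, order):
    """Finish time of the permutation schedule given by order."""
    end1 = end2 = 0
    for i in order:
        end1 += jobs[i][0]                    # machine 1 never idles
        end2 = max(end2, end1) + jobs[i][1]   # machine 2 waits for machine 1
    return end2


jobs = [(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]
order = flow_shop_order(jobs)
print(order, finish_time(jobs, order))
```

Both sorts are independent of each other, and the finish time is a max-plus prefix computation of the same kind as the admissible schedule, which is why parallel sorting and partial sums suffice.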